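In this project, you will use generative adversarial networks (GANs) to generate new images of faces.
This project uses the following datasets: MNIST and CelebA.
Because the CelebA dataset is fairly complex and this is your first time working with GANs, we suggest testing your model on MNIST first so that you can evaluate its performance more quickly.
If you are using FloydHub, set data_dir to "/input" and use the FloydHub data ID "R5KrjnANiKVhLWAkpXhNBe".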
data_dir = './data'
# FloydHub - Use with data ID "R5KrjnANiKVhLWAkpXhNBe"
#data_dir = '/input'
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
helper.download_extract('mnist', data_dir)
helper.download_extract('celeba', data_dir)
show_n_images = 25
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
%matplotlib inline
import os
from glob import glob
from matplotlib import pyplot
mnist_images = helper.get_batch(glob(os.path.join(data_dir, 'mnist/*.jpg'))[:show_n_images], 28, 28, 'L')
pyplot.imshow(helper.images_square_grid(mnist_images, 'L'), cmap='gray')
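The CelebFaces Attributes Dataset (CelebA) contains more than 200,000 celebrity images with annotations. You will use it to generate faces, so you will not need the annotations. You can explore the dataset by changing show_n_images.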
show_n_images = 25
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
mnist_images = helper.get_batch(glob(os.path.join(data_dir, 'img_align_celeba/*.jpg'))[:show_n_images], 28, 28, 'RGB')
pyplot.imshow(helper.images_square_grid(mnist_images, 'RGB'))
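Because the focus of this project is building the GAN, we preprocess the data for you.
After preprocessing, the MNIST and CelebA images are 28x28 with pixel values in the range [-0.5, 0.5]. The CelebA images are cropped to remove the parts of the image that do not show a face, then resized to 28x28.
The MNIST images are grayscale with a single color channel, while the CelebA images are RGB with three color channels.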
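The value range described above can be illustrated with a minimal sketch. It assumes the raw pixels are 8-bit values in [0, 255]; the function name scale_to_half_range and the random test image are hypothetical, and the actual preprocessing in helper.py may differ.

```python
import numpy as np


def scale_to_half_range(pixels):
    """Map uint8 pixel values in [0, 255] to floats in [-0.5, 0.5]."""
    return pixels.astype(np.float32) / 255.0 - 0.5


# Hypothetical 28x28 single-channel image standing in for an MNIST sample.
image = np.random.randint(0, 256, size=(28, 28, 1), dtype=np.uint8)
scaled = scale_to_half_range(image)
print(scaled.shape, scaled.min(), scaled.max())
```

A CelebA sample would have shape (28, 28, 3) instead, and the same scaling would apply to each channel.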
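You will build the main components of the GAN by implementing the following functions:
model_inputs, discriminator, generator, model_loss, model_opt, train
Check that you are using the correct version of TensorFlow and that you have access to a GPU.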
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from distutils.version import LooseVersion
import warnings
import tensorflow as tf
# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer. You are using {}'.format(tf.__version__)
print('TensorFlow Version: {}'.format(tf.__version__))
# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
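Implement the model_inputs function to create the TF placeholders for the neural network. It should create the following placeholders:
A real input images placeholder of rank 4, built from image_width, image_height, and image_channels; a z input placeholder built from z_dim; and a learning rate placeholder. Return the placeholders in the tuple (tensor of real input images, tensor of z data, learning rate).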
import problem_unittests as tests
def model_inputs(image_width, image_height, image_channels, z_dim):
    """
    Create the model inputs
    :param image_width: The input image width
    :param image_height: The input image height
    :param image_channels: The number of image channels
    :param z_dim: The dimension of Z
    :return: Tuple of (tensor of real input images, tensor of z data, learning rate)
    """
    # TODO: Implement Function
    input_real = tf.placeholder(tf.float32, (None, image_width, image_height, image_channels), name='input_real')
    input_z = tf.placeholder(tf.float32, (None, z_dim), name='input_z')
    learn_rate = tf.placeholder(tf.float32, name='learn_rate')
    return input_real, input_z, learn_rate
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_model_inputs(model_inputs)
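Implement the discriminator function to create a discriminator neural network that discriminates on images. The function should be able to reuse the variables in the neural network. Use tf.variable_scope with the scope name "discriminator" so that the variables can be reused.
The function should return a tuple of (tensor output of the discriminator, tensor logits of the discriminator).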
def discriminator(images, reuse=False):
    """
    Create the discriminator network
    :param images: Tensor of input image(s)
    :param reuse: Boolean if the weights should be reused
    :return: Tuple of (tensor output of the discriminator, tensor logits of the discriminator)
    """
    # TODO: Implement Function
    # Reference: DCGAN paper - https://arxiv.org/pdf/1511.06434.pdf
    # Reference: https://github.com/udacity/cn-deep-learning/blob/master/tutorials/dcgan-svhn/DCGAN.ipynb
    alpha = 0.2
    with tf.variable_scope('discriminator', reuse=reuse):
        '''
        # Input layer is 28x28x3
        x1 = tf.layers.conv2d(images, 64, 5, strides=2, padding='same')
        relu1 = tf.maximum(alpha * x1, x1)
        # 14x14x64
        x2 = tf.layers.conv2d(relu1, 128, 5, strides=2, padding='same')
        bn2 = tf.layers.batch_normalization(x2, training=True)
        relu2 = tf.maximum(alpha * bn2, bn2)
        # 7x7x128
        x3 = tf.layers.conv2d(relu2, 256, 5, strides=2, padding='same')
        bn3 = tf.layers.batch_normalization(x3, training=True)
        relu3 = tf.maximum(alpha * bn3, bn3)
        # 4x4x256
        '''
        '''review suggestion
        To improve the model further, you can use dropout with a keep probability of 0.8
        and initialize the weights with Xavier initialization.
        To use Xavier initialization, pass tf.contrib.layers.xavier_initializer()
        as the kernel_initializer parameter of tf.layers.conv2d.
        Xavier initialization can speed up convergence. Because this project deliberately uses
        a small number of epochs to encourage you to tune the model, a better weight
        initialization may give better results.
        Xavier initialization may also make it more likely that the model converges to a lower loss.
        '''
        # Input layer is 28x28xC (C = 1 for MNIST, 3 for CelebA)
        x1 = tf.layers.conv2d(images, 64, 5, strides=2, padding='same',
                              kernel_initializer=tf.contrib.layers.xavier_initializer())
        relu1 = tf.maximum(alpha * x1, x1)
        # Apply dropout to the activated output so the leaky ReLU is not skipped
        drop1 = tf.nn.dropout(relu1, keep_prob=0.8)
        # 14x14x64
        x2 = tf.layers.conv2d(drop1, 128, 5, strides=2, padding='same',
                              kernel_initializer=tf.contrib.layers.xavier_initializer())
        bn2 = tf.layers.batch_normalization(x2, training=True)
        relu2 = tf.maximum(alpha * bn2, bn2)
        drop2 = tf.nn.dropout(relu2, keep_prob=0.8)
        # 7x7x128
        x3 = tf.layers.conv2d(drop2, 256, 5, strides=2, padding='same',
                              kernel_initializer=tf.contrib.layers.xavier_initializer())
        bn3 = tf.layers.batch_normalization(x3, training=True)
        relu3 = tf.maximum(alpha * bn3, bn3)
        drop3 = tf.nn.dropout(relu3, keep_prob=0.8)
        # 4x4x256
        # Flatten it
        flat = tf.reshape(drop3, (-1, 4*4*256))
        logits = tf.layers.dense(flat, 1)
        out = tf.sigmoid(logits)
    return out, logits
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_discriminator(discriminator, tf)
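The shape comments in the discriminator (28 to 14 to 7 to 4) follow from the 'same' padding rule, where the output size is the input size divided by the stride, rounded up. The sketch below checks that arithmetic in plain Python; the helper name same_conv_output_size is hypothetical and is not part of TensorFlow.

```python
import math


def same_conv_output_size(size, stride):
    """Spatial output size of a strided convolution with 'same' padding."""
    return math.ceil(size / stride)


size = 28
sizes = [size]
for _ in range(3):  # three stride-2 convolutions, as in the discriminator
    size = same_conv_output_size(size, stride=2)
    sizes.append(size)

print(sizes)  # [28, 14, 7, 4]
print(sizes[-1] * sizes[-1] * 256)  # 4096 features after flattening
```

This is why the flatten step reshapes to 4*4*256: the last convolution produces a 4x4 map with 256 filters.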
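Implement the generator function to generate an image from z. The function should be able to reuse the variables in the neural network.
Use tf.variable_scope with the scope name "generator" so that the variables can be reused.
The function should return the generated 28 x 28 x out_channel_dim images.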
def generator(z, out_channel_dim, is_train=True):
    """
    Create the generator network
    :param z: Input z
    :param out_channel_dim: The number of channels in the output image
    :param is_train: Boolean if generator is being used for training
    :return: The tensor output of the generator
    """
    # TODO: Implement Function
    # Reference: DCGAN paper - https://arxiv.org/pdf/1511.06434.pdf
    # Reference: https://github.com/udacity/cn-deep-learning/blob/master/tutorials/dcgan-svhn/DCGAN.ipynb
    alpha = 0.2
    with tf.variable_scope('generator', reuse=not is_train):
        # First fully connected layer
        x1 = tf.layers.dense(z, 2*2*512)
        # Reshape it to start the convolutional stack
        x1 = tf.reshape(x1, (-1, 2, 2, 512))
        x1 = tf.layers.batch_normalization(x1, training=is_train)
        x1 = tf.maximum(alpha * x1, x1)
        x1 = tf.nn.dropout(x1, 0.8)
        # 2x2x512 now
        x2 = tf.layers.conv2d_transpose(x1, 256, 5, strides=2, padding='valid')
        x2 = tf.layers.batch_normalization(x2, training=is_train)
        x2 = tf.maximum(alpha * x2, x2)
        x2 = tf.nn.dropout(x2, 0.8)
        # 7x7x256 now
        x3 = tf.layers.conv2d_transpose(x2, 128, 5, strides=2, padding='same')
        x3 = tf.layers.batch_normalization(x3, training=is_train)
        x3 = tf.maximum(alpha * x3, x3)
        x3 = tf.nn.dropout(x3, 0.8)
        # 14x14x128 now
        # Output layer
        logits = tf.layers.conv2d_transpose(x3, out_channel_dim, 5, strides=2, padding='same')
        # 28x28xout_channel_dim now
        out = tf.tanh(logits)
    return out
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_generator(generator, tf)
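The generator's shape comments (2 to 7 to 14 to 28) can be checked with the usual output-size rules for transposed convolutions: 'valid' padding gives size * stride + max(kernel - stride, 0), and 'same' padding gives size * stride. The sketch below is plain Python; the helper name transpose_conv_output_size is hypothetical and is not a TensorFlow function.

```python
def transpose_conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution."""
    if padding == 'valid':
        return size * stride + max(kernel - stride, 0)
    if padding == 'same':
        return size * stride
    raise ValueError('padding must be "valid" or "same"')


# (kernel, stride, padding) for the three transposed convolutions in the generator
layers = [(5, 2, 'valid'), (5, 2, 'same'), (5, 2, 'same')]

size = 2  # spatial size after reshaping the dense layer to 2x2x512
sizes = [size]
for kernel, stride, padding in layers:
    size = transpose_conv_output_size(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [2, 7, 14, 28]
```

Using 'valid' padding for the first layer is what turns the 2x2 map into 7x7, so that two further stride-2 layers land exactly on 28x28.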
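Implement the model_loss function to build the GAN for training and calculate the loss. The function should return a tuple of (discriminator loss, generator loss).
Use the following functions you have already implemented: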
discriminator(images, reuse=False), generator(z, out_channel_dim, is_train=True)
def model_loss(input_real, input_z, out_channel_dim):
    """
    Get the loss for the discriminator and generator
    :param input_real: Images from the real dataset
    :param input_z: Z input
    :param out_channel_dim: The number of channels in the output image
    :return: A tuple of (discriminator loss, generator loss)
    """
    # TODO: Implement Function
    # Reference: https://github.com/udacity/cn-deep-learning/blob/master/tutorials/dcgan-svhn/DCGAN.ipynb
    g_model = generator(input_z, out_channel_dim, is_train=True)
    # Review suggestion:
    # To keep the discriminator from becoming too strong and to help it generalize,
    # the real labels are commonly multiplied by 0.9. This is called (one-sided) label smoothing.
    # It can be implemented with labels = tf.ones_like(tensor) * (1 - smooth).
    d_model_real, d_logits_real = discriminator(input_real, reuse=False)
    d_model_fake, d_logits_fake = discriminator(g_model, reuse=True)
    smooth = 0.1
    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_real, labels=tf.ones_like(d_model_real) * (1 - smooth)))
    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake, labels=tf.zeros_like(d_model_fake)))
    g_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake, labels=tf.ones_like(d_model_fake)))
    d_loss = d_loss_real + d_loss_fake
    return d_loss, g_loss
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_model_loss(model_loss)
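To make the loss terms concrete, here is a minimal plain-Python sketch of sigmoid cross-entropy on logits with one-sided label smoothing. It uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)); the logit values and the helper names sigmoid_cross_entropy and mean are hypothetical, chosen only for illustration.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Stable sigmoid cross-entropy: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    return sum(values) / len(values)


smooth = 0.1
real_logits = [2.0, 0.5, -1.0]   # hypothetical discriminator logits for real images
fake_logits = [-1.5, 0.3, 1.0]   # hypothetical discriminator logits for generated images

# Discriminator: real images get the smoothed label 0.9, generated images get 0.
d_loss_real = mean([sigmoid_cross_entropy(x, 1.0 - smooth) for x in real_logits])
d_loss_fake = mean([sigmoid_cross_entropy(x, 0.0) for x in fake_logits])
d_loss = d_loss_real + d_loss_fake

# Generator: it wants the discriminator to label generated images as 1.
g_loss = mean([sigmoid_cross_entropy(x, 1.0) for x in fake_logits])

print(d_loss, g_loss)
```

Only the real labels are smoothed; the fake labels stay at 0 and the generator's target stays at 1, which matches the "one-sided" wording in the review suggestion.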
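Implement the model_opt function to create the optimization operations for the GAN. Use tf.trainable_variables to get all the trainable variables, and filter them by the scope names discriminator and generator. The function should return a tuple of (discriminator training operation, generator training operation).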
def model_opt(d_loss, g_loss, learning_rate, beta1):
    """
    Get optimization operations
    :param d_loss: Discriminator loss Tensor
    :param g_loss: Generator loss Tensor
    :param learning_rate: Learning Rate Placeholder
    :param beta1: The exponential decay rate for the 1st moment in the optimizer
    :return: A tuple of (discriminator training operation, generator training operation)
    """
    # TODO: Implement Function
    # Reference: https://github.com/udacity/cn-deep-learning/blob/master/tutorials/dcgan-svhn/DCGAN.ipynb
    t_vars = tf.trainable_variables()
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]
    g_vars = [var for var in t_vars if var.name.startswith('generator')]
    # Optimize (run the batch normalization update ops before each training step)
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        d_train_opt = tf.train.AdamOptimizer(learning_rate, beta1=beta1).minimize(d_loss, var_list=d_vars)
        g_train_opt = tf.train.AdamOptimizer(learning_rate, beta1=beta1).minimize(g_loss, var_list=g_vars)
    return d_train_opt, g_train_opt
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_model_opt(model_opt, tf)
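The scope-based filtering in model_opt relies on variable names starting with the name of the enclosing variable scope. The sketch below shows the same prefix filter on plain strings; the variable names listed are hypothetical examples, not names read from an actual graph.

```python
# Hypothetical variable names of the form "<scope>/<layer>/<kind>:0".
variable_names = [
    'discriminator/conv2d/kernel:0',
    'discriminator/dense/bias:0',
    'generator/dense/kernel:0',
    'generator/conv2d_transpose/kernel:0',
]

# Same prefix test as in model_opt, applied to strings instead of tf.Variable objects.
d_names = [name for name in variable_names if name.startswith('discriminator')]
g_names = [name for name in variable_names if name.startswith('generator')]

print(d_names)
print(g_names)
```

Keeping the two lists separate means each optimizer updates only its own network, so a generator step does not change the discriminator's weights and vice versa.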
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np
def show_generator_output(sess, n_images, input_z, out_channel_dim, image_mode):
    """
    Show example output for the generator
    :param sess: TensorFlow session
    :param n_images: Number of Images to display
    :param input_z: Input Z Tensor
    :param out_channel_dim: The number of channels in the output image
    :param image_mode: The mode to use for images ("RGB" or "L")
    """
    cmap = None if image_mode == 'RGB' else 'gray'
    z_dim = input_z.get_shape().as_list()[-1]
    example_z = np.random.uniform(-1, 1, size=[n_images, z_dim])
    samples = sess.run(
        generator(input_z, out_channel_dim, False),
        feed_dict={input_z: example_z})
    images_grid = helper.images_square_grid(samples, image_mode)
    pyplot.imshow(images_grid, cmap=cmap)
    pyplot.show()
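Implement the train function to build and train the GAN. Use the following functions you have already implemented:
model_inputs(image_width, image_height, image_channels, z_dim), model_loss(input_real, input_z, out_channel_dim), model_opt(d_loss, g_loss, learning_rate, beta1)
Use the show_generator_output function to show the generator output while you train.
Note: running show_generator_output for every batch will drastically increase the training time and the size of the notebook. We recommend printing the generator output every 100 batches.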
def train(epoch_count, batch_size, z_dim, learning_rate, beta1, get_batches, data_shape, data_image_mode):
    """
    Train the GAN
    :param epoch_count: Number of epochs
    :param batch_size: Batch Size
    :param z_dim: Z dimension
    :param learning_rate: Learning Rate
    :param beta1: The exponential decay rate for the 1st moment in the optimizer
    :param get_batches: Function to get batches
    :param data_shape: Shape of the data
    :param data_image_mode: The image mode to use for images ("RGB" or "L")
    """
    # TODO: Build Model
    # Reference: https://github.com/udacity/cn-deep-learning/blob/master/tutorials/dcgan-svhn/DCGAN.ipynb
    samples, image_width, image_height, image_channels = data_shape
    input_real, input_z, learn_rate = model_inputs(image_width, image_height, image_channels, z_dim)
    d_loss, g_loss = model_loss(input_real, input_z, image_channels)
    d_opt, g_opt = model_opt(d_loss, g_loss, learn_rate, beta1)
    steps = 0
    print_every = 10
    show_every = 100
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch_i in range(epoch_count):
            for batch_images in get_batches(batch_size):
                # TODO: Train Model
                steps += 1
                # Review suggestion:
                # The generator output passes through tanh, so it lies in [-1, 1],
                # while batch_images lies in [-0.5, 0.5]. Rescale the real images to [-1, 1]
                # (for example, batch_images = batch_images * 2) so that the real images
                # passed to the discriminator and the generated images share the same range.
                batch_images *= 2
                batch_z = np.random.uniform(-1, 1, size=[batch_size, z_dim])
                feed_dict = {input_real: batch_images, input_z: batch_z, learn_rate: learning_rate}
                # Run optimizers
                _ = sess.run(d_opt, feed_dict=feed_dict)
                _ = sess.run(g_opt, feed_dict=feed_dict)
                if steps % print_every == 0:
                    # Every print_every steps, evaluate and print the losses
                    train_loss_d = d_loss.eval({input_z: batch_z, input_real: batch_images})
                    train_loss_g = g_loss.eval({input_z: batch_z})
                    print("Epoch {}/{}...".format(epoch_i+1, epoch_count),
                          "Discriminator Loss: {:.4f}...".format(train_loss_d),
                          "Generator Loss: {:.4f}".format(train_loss_g))
                if steps % show_every == 0:
                    show_generator_output(sess, 25, input_z, image_channels, data_image_mode)
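The per-batch data preparation in train can be tried out on its own. The sketch below uses NumPy only: it rescales a batch from [-0.5, 0.5] to [-1, 1] to match the tanh output range and samples the z vectors uniformly from [-1, 1]. The random batch stands in for the output of get_batches and is purely hypothetical.

```python
import numpy as np

batch_size, z_dim = 4, 128

# Hypothetical stand-in for one preprocessed MNIST batch with values in [-0.5, 0.5].
batch_images = np.random.uniform(-0.5, 0.5, size=(batch_size, 28, 28, 1)).astype(np.float32)

# Match the range of the generator's tanh output, which is [-1, 1].
batch_images = batch_images * 2

# Latent vectors fed to the generator, one row per image in the batch.
batch_z = np.random.uniform(-1, 1, size=(batch_size, z_dim))

print(batch_images.min(), batch_images.max(), batch_z.shape)
```

Without the rescaling, the discriminator could separate real from generated images by value range alone, since real pixels would never exceed 0.5 in magnitude.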
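Test your GAN on MNIST. After 2 epochs, the GAN should be able to generate images that look like handwritten digits. Make sure the loss of the generator is lower than the loss of the discriminator, or close to 0.
"""review suggestion
It is best to set these parameters to powers of 2, such as 4, 8, 16, 32, or 64. This lets TensorFlow optimize
the computation and train the model faster. Batch size mainly affects the quality of the images your GAN generates. Some suggestions for setting the parameters:
● For CelebA, which contains many large images, a batch size of 16 or 32 is appropriate.
● For MNIST, where the images are relatively small (28 x 28 grayscale), use a batch size of 32 or 64.
● For GANs, a learning rate of 0.0002 should work well, although a slightly higher one (around 0.001) can noticeably reduce training time.
● A beta1 of around 0.5 or 0.4 also works well.
"""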
batch_size = 64
z_dim = 128
learning_rate = 0.0002
beta1 = 0.5
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
epochs = 2
mnist_dataset = helper.Dataset('mnist', glob(os.path.join(data_dir, 'mnist/*.jpg')))
with tf.Graph().as_default():
    train(epochs, batch_size, z_dim, learning_rate, beta1, mnist_dataset.get_batches,
          mnist_dataset.shape, mnist_dataset.image_mode)
Run your GAN on CelebA. On an average GPU, each epoch takes about 20 minutes. You can run the whole epoch, or stop it once the GAN starts to generate realistic faces.
batch_size = 32
z_dim = 128
learning_rate = 0.0002
beta1 = 0.5
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
epochs = 1
celeba_dataset = helper.Dataset('celeba', glob(os.path.join(data_dir, 'img_align_celeba/*.jpg')))
with tf.Graph().as_default():
    train(epochs, batch_size, z_dim, learning_rate, beta1, celeba_dataset.get_batches,
          celeba_dataset.shape, celeba_dataset.image_mode)
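Before submitting this project, make sure you run all the cells and then save the notebook.
Save the notebook as "dlnd_face_generation.ipynb", and also export it as HTML via "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.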